{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 32-1 String matching based on repetition factors\n",
    "\n",
    "> Let $y^i$ denote the concatenation of string $y$ with itself $i$ times. For example, $(\\text{ab})^3=\\text{ababab}$. We say that a string $x \\in \\Sigma^*$ has __*repetition factor*__ $r$ if $x = y ^ r$ for some string $y \\in \\Sigma^*$ and some $r > 0$. Let $\\rho$ denote the largest $r$ such that $x$ has repetition factor $r$.\n",
    "\n",
    "> __*a*__. Give an efficient algorithm that takes as input a pattern $P[1 \\dots m]$ and computes the value $\\rho(P_i)$ for $i = 1, 2, \\dots, m$. What is the running time of your algorithm?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute $\\pi$, let $l = m - \\pi[m]$, if $m ~\\text{mod}~ l = 0$ and for all $p = m - i \\cdot l > 0$, $p - \\pi[p] = l$, then $\\rho(P_i) = m / l$, otherwise $\\rho(P_i) = 1$.  The running time is $\\Theta(n)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*b*__. For any pattern $P[1 \\dots m]$, let $\\rho^*(P)$ be defined as $\\max_{1 \\le i \\le m} \\rho(P_i)$. Prove that if the pattern $P$ is chosen randomly from the set of all binary strings of length $m$, then the expected value of $\\rho^*(P)$ is $O(1)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$P(\\rho^*(P) \\ge 2)=\\frac{1}{2} + \\frac{1}{8} + \\frac{1}{32} + \\cdots \\approx \\frac{2}{3}$\n",
    "\n",
    "$P(\\rho^*(P) \\ge 3)=\\frac{1}{4} + \\frac{1}{32} + \\frac{1}{256} + \\cdots \\approx \\frac{2}{7}$\n",
    "\n",
    "$P(\\rho^*(P) \\ge 4)=\\frac{1}{8} + \\frac{1}{128} + \\frac{1}{2048} + \\cdots \\approx \\frac{2}{15}$\n",
    "\n",
    "$P(\\rho^*(P) = 1) = \\frac{1}{3}$\n",
    "\n",
    "$P(\\rho^*(P) = 2) = \\frac{8}{21}$\n",
    "\n",
    "$P(\\rho^*(P) = 3) = \\frac{16}{105}$\n",
    "\n",
    "$\\text{E}[\\rho^*(P)] = 1 \\cdot \\frac{1}{3} + 2 \\cdot \\frac{8}{21} + 3 \\cdot \\frac{16}{105} + \\dots \\approx 2.21 $"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*c*__. Argue that the following string-matching algorithm correctly finds all occurrences of pattern $P$ in a text $T[1 \\dots n]$ in time $O(\\rho^*(P)n + m)$:\n",
    "\n",
    "> ```\n",
    "REPETITION-MATCHER(P, T)\n",
    " 1  m = P.length\n",
    " 2  n = T.length\n",
    " 3  k = 1 + \\rho^*(P)\n",
    " 4  q = 0\n",
    " 5  s = 0\n",
    " 6  while s <= n - m\n",
    " 7      if T[s + q + 1] == P[q + 1]\n",
    " 8          q = q + 1\n",
    " 9          if q == m\n",
    "10               print \"Pattern occurs with shift\" s\n",
    "11      if q == m or T[s + q + 1] != P[q + 1]\n",
    "12          s = s + max(1, ceil(q/k))\n",
    "13          q = 0\n",
    "```\n",
    "\n",
    "> This algorithm is due to Galil and Seiferas. By extending these ideas greatly, they obtained a linear-time string-matching algorithm that uses only $O(1)$ storage beyond what is required for $P$ and $T$."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
